
Coalesce scale reads for non-multiple of block N #1030

Open
xintin wants to merge 15 commits into main from xintin/coalesce_b_scale_nonmultiple_blkN

@xintin
Contributor

@xintin xintin commented Mar 3, 2026

  1. Allows bounded reads (where N % block_N != 0) to participate in read coalescing, using precomputed mask expressions to zero out-of-bounds lanes.
  2. Relaxes the opsel scale coalescing from requiring exactly 4 members to accepting partial groups, which is needed for preshuffle scales when N isn't a multiple of block_N.
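To make item 1 concrete, here is a minimal pure-Python sketch of why a mask is needed at all when N % block_N != 0. It assumes a flat row of N scale values split into tiles of block_N lanes; `tile_lane_mask` and `masked_tile_read` are hypothetical names, not functions from this PR.

```python
import math


def tile_lane_mask(tile_idx: int, block_n: int, n: int) -> list[bool]:
    """Lane i of tile `tile_idx` is in bounds iff its global column is < n."""
    start = tile_idx * block_n
    return [start + i < n for i in range(block_n)]


def masked_tile_read(data: list[int], tile_idx: int, block_n: int) -> list[int]:
    """Read one block_n-wide tile, zeroing lanes that fall past the end of `data`."""
    n = len(data)
    start = tile_idx * block_n
    mask = tile_lane_mask(tile_idx, block_n, n)
    # data[...] is only evaluated for in-bounds lanes; masked lanes become 0.
    return [data[start + i] if ok else 0 for i, ok in enumerate(mask)]


if __name__ == "__main__":
    data = list(range(1, 11))  # N = 10
    block_n = 4                # N % block_N == 2, so the last tile is partial
    for t in range(math.ceil(len(data) / block_n)):
        print(t, masked_tile_read(data, t, block_n))
```

With N = 10 and block_N = 4, the last tile reads `[9, 10, 0, 0]`; the mask is what lets such a tile be loaded as a full-width read instead of being excluded from coalescing.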

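Item 2 can be pictured with a second sketch. It assumes that one 32-bit register holds four 8-bit scales and that an opsel value selects which byte a scaled MFMA instruction consumes; `group_scale_reads` and its `require_full` flag are hypothetical, with `require_full=True` standing in for the old exactly-4-members rule.

```python
from collections import defaultdict

BYTES_PER_DWORD = 4  # assumption: four 8-bit scales packed per 32-bit register


def group_scale_reads(byte_offsets, require_full=False):
    """Group scale byte offsets by the 32-bit word that contains them.

    Returns {dword_index: [(byte_offset, opsel), ...]}, where opsel is the
    byte position inside the dword. With require_full=True, groups with
    fewer than four members are dropped (i.e. left uncoalesced).
    """
    groups = defaultdict(list)
    for off in sorted(byte_offsets):
        groups[off // BYTES_PER_DWORD].append((off, off % BYTES_PER_DWORD))
    if require_full:
        return {d: m for d, m in groups.items() if len(m) == BYTES_PER_DWORD}
    return dict(groups)


if __name__ == "__main__":
    # Six scale bytes: the second dword is only half populated (a partial group).
    reads = [0, 1, 2, 3, 4, 5]
    print(group_scale_reads(reads, require_full=True))
    print(group_scale_reads(reads))
```

In this sketch the strict rule keeps only dword 0, while the relaxed rule also yields dword 1 with members `(4, 0)` and `(5, 1)`, so the partial word can be loaded once and addressed with opsel 0 and 1.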
xintin added 3 commits March 2, 2026 20:57
Signed-off-by: xintin <gaurav.verma@amd.com>
Signed-off-by: xintin <gaurav.verma@amd.com>
Signed-off-by: xintin <gaurav.verma@amd.com>
@xintin xintin force-pushed the xintin/coalesce_b_scale_nonmultiple_blkN branch from 99133ff to f6531d4 March 3, 2026 05:55
Signed-off-by: xintin <gaurav.verma@amd.com>
@xintin xintin force-pushed the xintin/coalesce_b_scale_nonmultiple_blkN branch from f6531d4 to 1a5f07e March 3, 2026 06:07
xintin and others added 5 commits March 3, 2026 06:50
Signed-off-by: xintin <gaurav.verma@amd.com>
Signed-off-by: xintin <gaurav.verma@amd.com>
Signed-off-by: xintin <gaurav.verma@amd.com>
@xintin xintin changed the title from "[wip] coalesce blk N where N%blk_N!=0" to "Coalesce scale reads for non-multiple block N" Mar 3, 2026
@xintin xintin changed the title from "Coalesce scale reads for non-multiple block N" to "Coalesce scale reads for non-multiple of block N" Mar 3, 2026
@Hardcode84 Hardcode84 self-requested a review March 4, 2026 17:23
Signed-off-by: xintin <gaurav.verma@amd.com>
@xintin xintin requested a review from Hardcode84 March 4, 2026 17:33
lo_custom.vector_shapes
)
lo_mask = _flatten_bounds_to_mask_expr(lo_custom, symbolic_shape)
if lo_mask is not None:
Contributor

Why can't we attach a new mask to the new read op?

Contributor Author

The two original reads (lo and hi) can have different bounds checks: each one has its own index and bounds. After coalescing, there is only one merged read op, which loads a wider vector covering both the lo and hi elements. We can't attach a single mask to that read, because each slice of the vector corresponds to one of the original reads and needs that read's bounds condition applied independently.
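A small sketch of this point, assuming a hypothetical layout in which the lo and hi slices are adjacent in memory but map to different logical column indices (as can happen with preshuffled scales). Each slice keeps its own bound, so the merged vector's mask is the concatenation of two independently computed masks rather than one range check. `SliceRead` and `coalesce_with_slice_masks` are illustrative names, not APIs from this PR.

```python
from dataclasses import dataclass


@dataclass
class SliceRead:
    """One original (pre-coalescing) read; all fields are illustrative."""
    phys_start: int     # offset of the slice in the flat physical buffer
    logical_start: int  # logical column index used for this slice's bounds check
    width: int          # number of lanes in the slice
    bound: int          # lane i is valid iff logical_start + i < bound

    def mask(self) -> list[bool]:
        return [self.logical_start + i < self.bound for i in range(self.width)]


def coalesce_with_slice_masks(buf: list[int], lo: SliceRead, hi: SliceRead) -> list[int]:
    """Do one wide read covering lo and hi, then zero lanes using each slice's own mask."""
    if hi.phys_start != lo.phys_start + lo.width:
        raise ValueError("lo and hi must be physically contiguous to coalesce")
    wide = buf[lo.phys_start : hi.phys_start + hi.width]  # single merged load
    mask = lo.mask() + hi.mask()  # concatenation of the two independent masks
    return [v if ok else 0 for v, ok in zip(wide, mask)]


if __name__ == "__main__":
    buf = list(range(100, 108))
    lo = SliceRead(phys_start=0, logical_start=8, width=4, bound=10)
    hi = SliceRead(phys_start=4, logical_start=2, width=4, bound=3)
    print(coalesce_with_slice_masks(buf, lo, hi))
    # A single range check derived from lo alone would wrongly drop hi's valid lane 0.
    naive = [lo.logical_start + i < lo.bound for i in range(lo.width + hi.width)]
    print(naive)
```

Here the merged result is `[100, 101, 0, 0, 104, 0, 0, 0]`, whereas the naive single mask is `[True, True, False, False, False, False, False, False]` and would also zero `104`.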

@xintin xintin requested a review from harsh-nod March 4, 2026 18:03
Signed-off-by: xintin <gaurav.verma@amd.com>
@xintin xintin force-pushed the xintin/coalesce_b_scale_nonmultiple_blkN branch from cba078c to f83061e March 4, 2026 18:10
xintin and others added 3 commits March 4, 2026 10:13
Signed-off-by: xintin <gaurav.verma@amd.com>
Signed-off-by: xintin <gaurav.verma@amd.com>
@xintin xintin force-pushed the xintin/coalesce_b_scale_nonmultiple_blkN branch 2 times, most recently from 014a531 to 48d39d6 March 4, 2026 21:15
Signed-off-by: xintin <gaurav.verma@amd.com>
@xintin xintin force-pushed the xintin/coalesce_b_scale_nonmultiple_blkN branch from 48d39d6 to 8c89da2 March 4, 2026 21:48


2 participants